-
Notifications
You must be signed in to change notification settings - Fork 488
New tool addition: amas tool #7443
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
See updated changes from George at jchchiu#1 |
tools/amas/amas.xml
Outdated
| <param name="in_format" type="select" label="Format of the input file"> | ||
| <option value="fasta">fasta</option> | ||
| <option value="phylip">phylip</option> | ||
| <option value="phylip-int">phylip-int</option> | ||
| <option value="nexus">nexus(sequential)</option> | ||
| <option value="nexus-int">nexus(interleaved)</option> | ||
| </param> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fasta phylip and nexus can be distinguishe automatically, e.g. $input_file.ext gives the Galaxy datatype. Is the info on interleaved/not needed? Can it be determined automatically?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't seem like they have a function that detects the format for interleaved automatically; instead it depends on the input you give it. Can galaxy automatically distinguish interleaved?
tools/amas/amas.xml
Outdated
| </collection> | ||
|
|
||
| <collection name="converted_alignments" type="list" label="Converted alignments"> | ||
| <discover_datasets directory="run_dir/convert" pattern="(?P<name>.+)-out\..+" format="data" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should set the format instead if format="data"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you have a look at the amas_split.xml at L55; is this what you were thinking?
|
Hey @bernt-matthias, could you have a look at amas_concat.xml and see if this is on the right track? If so, I'll update the rest of the subcommands with your suggestions. |
| concat | ||
| --concat-part partitions.txt | ||
| --concat-out concatenated.out | ||
| --part-format $part_format |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can determine the input format from $input_files.ext.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment also relevant to #7443 (comment)
The problem I have with this is that if it is a nexus or phylip file, their extension doesn't always tell whether it is an interleaved or sequential format. Even if you sniff it as an interleaved does $input_files.ext return phlyip-int or something like that which differentiates it from normal phylip? Otherwise I'm pretty sure amas needs the user to explicitly set the file format as an input.
What are you thoughts on only taking non-interleaved formats, and give a warning to the user that it will not accept interleaved in the help or something? Following this also removing the option to output it as an interleaved file? Problem I see with this is that they can still upload an interleaved file since they have the same extension.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest to leave it for now work on an extension of the datatypes. Do you think the inserleaved / sequential datatypes are well defined? Are you interested in working on this. I could try to give you some pointers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively we could also implement a small helper script that checks if the data is interleaved. Seems rather trivial.
For the output my suggestion would be a boolean interleaved: yes/no. Plus a select: as input / phylip / nexus / fasta?
tools/amas/amas_concat.xml
Outdated
| help="A file defining how the concatenated alignment is split into separate gene/locus regions. Each line specifies a partition name and its position range (e.g., 'gene1 = 1-500' or 'DNA, gene1 = 1-500' for RAxML format)."> | ||
| <option value="nexus">nexus</option> | ||
| <option value="raxml">raxml</option> | ||
| <option value="unspecified" selected="true">unspecified</option> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens in the unspecified case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It just has the genes and their start and end; should I add more context or just direct them to the help section?
- Unspecified:
gene1 = 1-500 - RAxML:
DNA, gene1 = 1-500 - NEXUS:
#NEXUS
Begin sets;
charset gene1 = 1-500;
End;
…d some formatting
|
I've been testing the split subcommand again and it seems like AMAS doesn't work when you use a RAxML or NEXUS formatted partitions file as an input. The regex operator only works for the unspecified partitions: I've updated the subcommand accordingly with some more info. |
…titions; removed with note and more info
FOR CONTRIBUTOR:
Regarding issue #7442
To do:
For now, is it possible to review the code to see if it's on the right track, and if there are any better ways to structure it?